Optimal Recovery Schemes in Distributed Computing
Abstract
Clusters and distributed systems offer fault tolerance and high performance through load sharing, and are thus attractive in real-time applications. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers fail, their load must be redistributed to the remaining computers. The redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combinations of computers break down, i.e. we want to optimize the worst-case behavior. In this paper we compare the following schemes (Modulo ruler, Golomb ruler, Greedy Sequence, Sloane Sequence, Log Sequence) with respect to their worst-case behavior. We conclude that our scheme (the Sloane scheme) performs better than all the other schemes.
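The worst-case criterion in the abstract can be illustrated with a small brute-force sketch. It is not the paper's method and does not implement any of the named schemes. It assumes a simplified model that is not spelled out in the abstract: n computers numbered 0 to n-1, one unit of load each, and a hypothetical `offsets` list such that the load of failed computer c moves to the first working computer among (c + offset) mod n. The function names `max_load` and `worst_case_load` are made up for this illustration.

```python
from itertools import combinations


def max_load(n, offsets, failed):
    """Largest load on any working computer after the computers in `failed` go down.

    Assumed model (not from the paper): every computer starts with load 1, and
    the load of a failed computer c moves to the first working computer in the
    order (c + offsets[0]) % n, (c + offsets[1]) % n, ...
    """
    load = {c: 1 for c in range(n) if c not in failed}
    for c in failed:
        for offset in offsets:
            target = (c + offset) % n
            if target not in failed:
                load[target] += 1
                break
        else:
            raise ValueError(f"no working computer found for computer {c}")
    return max(load.values())


def worst_case_load(n, offsets, k):
    """Maximum of max_load over every possible set of k failed computers."""
    return max(
        max_load(n, offsets, set(failed))
        for failed in combinations(range(n), k)
    )


if __name__ == "__main__":
    # Same five computers, two failures, two different offset orders.
    print(worst_case_load(5, [1, 2, 3, 4], 2))  # prints 3
    print(worst_case_load(5, [1, 3, 2, 4], 2))  # prints 2
```

Under these assumptions, merely reordering the offsets lowers the worst-case load from 3 to 2 for two failures among five computers, which is the kind of difference a comparison of recovery schemes is meant to expose. The exhaustive search grows combinatorially with n and k, so it is only practical for small clusters.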
Similar resources
Recovery Schemes for High Availability and High Performance Cluster Computing
Clusters and distributed systems offer two important advantages, viz. fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers break down the load on these computers must be redistributed to other computers in the cluster. The redistribution is determined by t...
Distributed Recovery Units: An Approach for Hybrid and Adaptive Distributed Recovery
Traditionally, distributed recovery schemes have been designed for systems consisting of multiple recovery units. Each recovery unit (RU) resides on a single processor and it can fail and recover as a whole. This report introduces the "distributed recovery unit (DRU)" abstraction as an approach for design of "hybrid" and "adaptive" recovery schemes for distributed systems. The distributed syste...
Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding
We consider the problem of massive matrix multiplication, which underlies many data analytic applications, in a large-scale distributed system comprising a group of worker nodes. We target the stragglers’ delay performance bottleneck, which is due to the unpredictable latency in waiting for slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named entangled ...
A Survey and Performance Analysis of Checkpointing and Recovery Schemes for Mobile Computing Systems
Ruchi Tuli (Yanbu University College, Royal Commission for Jubail and Yanbu, Directorate General for Yanbu, P.O. Box 30436, Madinat Yanbu Al Sinaiyah, Kingdom of Saudi Arabia) and Parveen Kumar (Merrut Institute of Engineering and Technology, Merrut, India) ...
A Case for Multi-Level Distributed Recovery Schemes
Most of the distributed recovery schemes proposed in the literature are designed to tolerate an arbitrary number of failures, with a few notable exceptions of schemes designed to tolerate single failures. In this report, we demonstrate that it is often advantageous to use "multi-level" recovery schemes. A "multi-level" recovery scheme is one that can tolerate different number of faults at different...